My study tests the hypothesis that bees prefer to revisit the same flowers rather than visit multiple different flowers in a patch. Sampling was done in Sweden with reared colonies of bees in an open field. Bees were individually marked (ID’d) and left to pollinate for 5 days at a time. After each period, pollen was removed from the bees’ bodies and collected in jars. Machine learning was then used to assign each pollen sample to one of the flower species identified in the field. Once the program had run, the classified samples were entered into a CSV file recording, for each marked bee, which flowers its pollen came from and how much pollen (in g) it collected from each flower per trip.
I will be looking for you to clearly take your reader through all of the elements of data manipulation, analysis, and, where appropriate, visualization. You should provide as much coding detail, explanation, and output tables as necessary to compare your results to those published.
Annotate your code well, and good luck!
First, we identify and name the relevant columns in the CSV. Here I extract the column for each flower species that pollen was taken from and give it a shorter, more memorable name to use throughout the replication.
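A minimal sketch of this renaming step follows. It assumes a data file called pollen_data.csv, and the original CSV headers in column_map (left-hand side) are hypothetical placeholders; substitute the real file name and headers. The helper rename_columns() is likewise a hypothetical name, not part of any package.

```r
# Hypothetical mapping from the CSV's original column headers (left) to the
# short names used throughout this assignment (right). The header strings are
# placeholders; replace them with the actual headers in the file.
column_map <- c(
  "Prunus.spp."       = "Prunus",
  "Brassicaceae.type" = "Brassicaceae",
  "Lamium.spp."       = "Lamium"
  # ... one entry per flower column
)

# Rename the mapped columns of a data frame, leaving all other columns as-is.
rename_columns <- function(df, column_map) {
  hits <- names(df) %in% names(column_map)
  names(df)[hits] <- unname(column_map[names(df)[hits]])
  df
}

# Usage with the real data (file name is an assumption):
# pollen <- rename_columns(read.csv("pollen_data.csv"), column_map)
# Prunus       <- pollen$Prunus
# Brassicaceae <- pollen$Brassicaceae
# ... and so on for each species
```

Keeping the mapping in one named vector means every rename is visible in a single place, which makes it easy to check against the published species list.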
To replicate the first figure, we need the mean weight (in grams) of each pollen type. I calculate these directly by calling the mean() function on each of the column vectors created in the previous step.
mean(Prunus)
## [1] 0.1921359
mean(Brassicaceae)
## [1] 0.1278986
mean(Lamium)
## [1] 0.09972985
mean(Acer)
## [1] 0.1018838
mean(Lonicera)
## [1] 0.07003503
mean(Salix)
## [1] 0.06957033
mean(Prunus_Group_B)
## [1] 0.04282827
mean(Papaver)
## [1] 0.04312855
mean(Cytisus)
## [1] 0.03175761
mean(Trifolium_repens)
## [1] 0.02704338
mean(Aesculus)
## [1] 0.02084907
mean(Pulmonaria_Group)
## [1] 0.01453764
mean(Laburnum)
## [1] 0.01478939
mean(Phacelia)
## [1] 0.01422394
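Writing out fourteen mean() calls works, but the same values can come from a single call, and the result comes back already labeled with the species names, so values and labels cannot drift out of order. A minimal sketch, assuming the renamed columns sit together in one numeric data frame (called pollen here, a hypothetical name; species_means() is also hypothetical). The na.rm = TRUE setting is an addition not in the original code: it skips missing values, which would otherwise make a species’ mean NA.

```r
# Compute every species mean in one call. Assumes `pollen` is a data frame
# whose columns include the 14 short names (hypothetical object name).
species_means <- function(pollen, species) {
  # colMeans() returns a numeric vector named after the selected columns,
  # so each value carries its species label; na.rm = TRUE skips NAs.
  colMeans(pollen[, species, drop = FALSE], na.rm = TRUE)
}

Flower_species <- c("Prunus", "Brassicaceae", "Lamium", "Acer", "Lonicera",
                    "Salix", "Prunus_Group_B", "Papaver", "Cytisus",
                    "Trifolium_repens", "Aesculus", "Pulmonaria_Group",
                    "Laburnum", "Phacelia")

# Usage with the real data frame:
# means <- species_means(pollen, Flower_species)
# round(means, 4)
```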
Now we can start building the figure. First, I store all of the means from the previous step in a numeric vector called “means”, so the bar heights can be passed to the bar chart as one object. Next, I create a character vector of flower names, “Flower_species”, in the same order, so the species can be labeled on the x-axis through barplot’s names.arg argument. Finally, I load the RColorBrewer package so that each bar can be given its own color to distinguish species visually.
# Numeric vector of the 14 species means; these become the bar heights (y-axis)
means <- c(mean(Prunus), mean(Brassicaceae), mean(Lamium), mean(Acer), mean(Lonicera), mean(Salix), mean(Prunus_Group_B), mean(Papaver), mean(Cytisus), mean(Trifolium_repens), mean(Aesculus), mean(Pulmonaria_Group), mean(Laburnum), mean(Phacelia))
# Character vector of species names, in the same order as "means", used to label the x-axis
Flower_species <- c("Prunus", "Brassicaceae", "Lamium", "Acer", "Lonicera", "Salix", "Prunus_Group_B", "Papaver", "Cytisus", "Trifolium_repens", "Aesculus", "Pulmonaria_Group", "Laburnum", "Phacelia")
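library(RColorBrewer)
# Set3 is a qualitative (pastel) palette with at most 12 colors. Calling
# brewer.pal(14, "Set3") only warns and returns those 12, so the last two bars
# would reuse the first two colors. colorRampPalette() interpolates the 12 Set3
# colors into one distinct color per species (14 here).
coul <- colorRampPalette(brewer.pal(12, "Set3"))(length(Flower_species))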
# mar sets the plot margins in lines of text, ordered c(bottom, left, top, right).
# Raising the bottom margin to 8 leaves room for the vertical species names.
par(mar = c(8, 4, 4, 1) + 0.1)
barplot(means, ylim = c(0.00, 0.20), names.arg = Flower_species,
        las = 2,        # labels perpendicular to the axes, so species names run vertically
        space = 0.05,   # narrow gap between bars
        xlab = " ", ylab = "Pollen Proportion",
        main = "Pollen Proportion vs. Flower Species",
        col = coul, cex.axis = 0.9)  # one color per bar; slightly smaller y-axis numbers
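The bars above are not strictly in descending order: in the earlier output, Acer’s mean (0.1019) is slightly larger than Lamium’s (0.0997), Papaver’s (0.0431) slightly larger than Prunus_Group_B’s (0.0428), and Laburnum’s (0.0148) slightly larger than Pulmonaria_Group’s (0.0145). If the published figure ranks species by mean, a minimal sketch like the following sorts the bars before plotting. The function name plot_sorted_means() is hypothetical; the plotting arguments are the ones used above.

```r
# Sketch: redraw the bar chart with species sorted from highest to lowest mean.
# Arguments: the numeric `means`, the matching `labels`, and the bar colors.
plot_sorted_means <- function(means, labels, cols) {
  ord  <- order(means, decreasing = TRUE)   # bar positions, largest mean first
  cols <- rep_len(cols, length(means))      # recycle colors as barplot would
  old  <- par(mar = c(8, 4, 4, 1) + 0.1)    # room for vertical species names
  on.exit(par(old))                         # restore previous margins on exit
  barplot(means[ord], names.arg = labels[ord], las = 2, space = 0.05,
          ylim = c(0, 0.20), xlab = " ", ylab = "Pollen Proportion",
          main = "Pollen Proportion vs. Flower Species",
          col = cols[ord], cex.axis = 0.9)  # each species keeps its own color
  invisible(ord)                            # return the ordering used
}

# Usage with the objects created above:
# plot_sorted_means(means, Flower_species, coul)
```

Reordering the colors with the same index as the bars keeps each species’ color identical between the unsorted and sorted charts, which makes the two easier to compare.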